Introduction to Movement Modeling


Preliminaries

  • Install these packages in R: {ctmm}, {curl}, {rlist}, {dplyr}.

Objectives

The objective of this module is to discuss the function and application of animal movement models, specifically, Contiuous-Time Movement Models (ctmm), and explore measures of overlap.


A guy walks into a…. CTSP?


Random Walk Models

You are at the bar, and you just got you and your friends a third round of drinks. You turn around and the dance floor is packed like sardines, and your friends are at the very front next to the DJ booth. How do you get from where you are to where you’re going? Maybe you take a step to the left, two steps forward, another back and to the right. Each step you take is independent from the last, as the dance floor is moving, grooving and no one is staying still. This goes on (and on and on) until you have made your way back to your friends.

Some movement ecologists would consider this navigation an example of a random walk model. Random walk models assume that in a time period, an individual will take random and independent steps that are identically distributed in size and away from their previous position (Nau 2014). However, these steps do depend on the location of your previous step, with each step the starting point of your next.. that is - unless you’ve figured out teleportation.



Correlated Random Walk Models

Now, imagine it’s the morning after your night out and you decide to run off the yucky post-drinking feelings. After lacing up your tennis shoes, you hit the esplanade and begin running your usual route along the Charles. Your strides are relatively the same direction, uniformly distributed, and full of nauseating regret (Codling et al. 2018).

We can consider this directionally persistent example a correlated random walk model (CRW). This means that there is a directional bias to your travel. This is a great model to use for animal movement, as animals tend to move in one direction- forward (Codling et al. 2018).

CRWs were considered the standard for modeling movement data for a long time. However, such models cannot handle multi-scale autocorrelation and disproportionately represent sampling schedule rather than underlying meaningful movement processes (Calabrese et al. 216). These shortcomings lead to inconsistent and biased results. Continuous-time stochastic process models (CTSP models) were created as an alternative to CRW models and resolve these issues.



Continuous-Time Stochastic Process Models

Okay. Now, you’ve made it home from your run and you want to see the data on your fitness tracker app. But, OH NO! You must have accidentally turned it on last night when you were leaving the bar. It tracked your uber ride home, when you got out of bed to get a glass of water, when you laid on the bathroom floor for a while, and then the run. That is a lot of movement data!

This, like any GPS monitoring system, is an example of continuous-time stochastic process modeling. That is, this is a model that accounts for finely sampled and continuous modern data collection (Calabrese et al. 2016).

CRW was so focused on the sampling schedule that it neglected continuous autocorrelated movement. CTSP models, on the other hand, are able to separate these factors while taking both into consideration. CTSP models are able to accommodate a range of data characteristics including irregular sampling schedules and varying levels of autocorrelation. However, CTSP modeling can be statistically complex and lack of comprehensive software to process this kind of data made it relatively inaccessible.

Fortunately, the continuous-time movement modeling (ctmm) package for R was created. ctmm allows a broad array of movement related data analyses including diagnostic data visualization and model fitting. These parameters can be used to analyze various metrics related to animal movement. In this module, we will examine evidence of range residency, calculate home range size with confidence intervals, and visualize the overlap between the home ranges of two coyotes.

Spatial Autocorrelation

‘Everything is related to everything else, but near things are more related than distant things’ - Waldo R. Tobler

As we have discussed in class previously, statistical analyses often rest on the assumption that data are independent and identically/normally distributed. However, we have also explored how such discrete independence is often not the case (like with our introduction to regression modeling). Investigating autocorrelation, the similarity of near observations, can reveal patterns and relationships that can better inform the selection of appropriate statistical analyses.

Spatial data have a strong tendency to be dependent, meaning near values are more/less similar than expected compared to randomly associated pairs of observations. When dealing with spatial data, like the animal movement data we are exploring here, spatial autocorrelation is a critical concept that drives model fitting and analysis.

We will continue to explore this concept in greater detail as we dive deeper into the components of ctmm modeling which help ensure our spatial models are statistically meaningful.

ctmm & Movebank

Recently there has been a rapid development and advancement of the technology associated with movement ecology (Calabrese et al. 2016). The discrete-time correlated random walk, or CRW, had been the standard tool utilized for the statistical analysis of this data. Yet, this methodology is not precise and often confounds the movement process thus producing different results even when the exact same path is being sampled. This is because CRW reflects the sampling schedule more than it does the actual process of movement (Calabrese et al. 2016). The package ctmm solved this problem via the use of a continuous-time stochastic process (Calabrese et al. 2016).

Movebank is a free online tool that allows individuals to store and archive animal movement data to be used by other scientists and lay people alike (Mrozewski 2018). This website is an especially useful tool for cross species comparisons, as well as comparisons over time, and climate/ecological change (Kranstauber et al. 2011). You can explore the website and various publicly available datasets for yourself here!

Conservation Implications

Understanding animal movement is crucial when studying population ecology, disease spread, the flow of genetic material, and conservation among other things (Calabrese et al., 2016). In so far as conservation goes, this kind of data allows us to understand how animals are using their environments, and what can be considered their home ranges. With this information, we can designate certain areas as protected, create corridors and encourage reforestation of degraded areas.



Movement Data Analysis


As shown in the background literature, advancements in movement modelling have allowed for more accurate home range estimates that account for both the time dimension and autocorrelation in global position system (GPS) data. Below is an example of the implementation of the R package {ctmm} (Contnuous-Time Movement Modeling) to calculate the autocorrelated kernel density estimates (AKDE) of coyotes (Canis latrans) from publicly available movement data shared on Movebank (Mahoney & Young, 2017). These data were collected via Lotek GPS collars (Model GPS3300S; Lotek, Newmarket, ON, Canada) on 8 coyotes in Idaho between 2004-2015. If you would like to also play around with this massive data set, take a look at the data repository on Movebank.


Data Configuration

We will begin by loading in a subset of the Mahoney & Young 2017 data. For the purpose of this module, we are only using data from ~ 1 month of the study (January 2005). The GPS collars used in the study recorded data every five minutes for over ten years and no one likes code that takes days to run :’)

Let’s first load our packages and then use {curl} to save the Movebank GPS data into the new object bigdata

library(ctmm)
library(curl)
f <- curl("https://raw.githubusercontent.com/vzdanowicz/AN588_ctmm/main/thisisit.csv")
bigdata <- read.csv(f, header = TRUE, sep = ",", stringsAsFactors = FALSE)


The {ctmm} package was designed explicitly to go hand in hand with Movebank. For any ctmm function to operate, data must be in Movebank format and coerced into a telemetry object.

Telemetry objects in {ctmm} contain information about the movement being analyzed including GPS location and sampling schedule. Objects from the {move} package in R can also be utilized in ctmm by using the as.telemetry function. Once in the proper format, movement data can be manipulated and visualized for data characterization and analysis.

coyotes <- as.telemetry(bigdata)

# Changing the names of the coyote IDs to make coding easier!
names(coyotes)<-c('coyA','coyB','coyC','coyD','coyE','coyF','coyG','coyH')


Data Visualization

As with any new data - it is important to explore!

Plotting and visualizing raw movement tracks is a great way to assess your data as a whole. In initial visualization you can detect obvious migratory behaviors, directional patterns, or perhaps even erroneous data points that require attention. This step can also aid in data selection for modeling.

Let’s visualize…

par(mfrow = c(1, 1))
plot(coyotes, col = rainbow(length(coyotes)))
title("All Coyotes")


From this plot, we can see pretty distinct ranges for the 8 individuals and there do not appear to be any unnatural patterns or random points that could skew future analysis.


Though you can use ctmm models to analyze cumulative data from an entire population, for the sake of time, we are going to build our model using the data for a single coyote, coyG.

Let’s begin by pulling out all of the movement data for this individual to create and then plot the object coy1

coy1 <- coyotes$coyG
plot(coy1)


We can see from the fairly tight clustering of GPS points, that this coyote appears to be range resident with no obvious migrations. The next step of our workflow is to calculate and plot variograms from the GPS data for this individual.



Variograms


Variograms are plots of the semi-variance between positions with varying time-lags, providing an unbiased way to visualize the autocorrelation of our data. The structure of the variogram provides critical information on the movement of an animal, as well as what method of modeling will fit the data best.

Arguably, the most crucial element of a variogram is if the semi-variance reaches an asymptote. This indicates that an animal is range-resident, which allows our range calculation to be meaningful.

The code below employs the function variogram() to calculate and then plot the variogram of telemetry object, coy1.


vg.coy1<-variogram(coy1)

#plotting long/short lag variograms
par(mfrow = c(1, 2))
plot(vg.coy1)
title("Long Lag")
plot(vg.coy1,fraction=0.005)
title("Short Lag")


The asymptote of the variogram for coy1 at longer (day) lags supports our visual analysis that coy1 is range-resident, while the slight upward curve of the short (minutes) lags show evidence of directional persistence in the data.

Variograms are used to determine the initial parameter guesstimates which are then utilized in maximum likelihood estimation, which we will describe in more detail later on.



CHALLENGE 1


Follow the steps outlined above to calculate and plot the variograms for a different individual, coyH.

  • What do the variograms for this coyote suggest?
  • How can we interpret the overall picture of the first variogram?
  • What does a more zoomed in variogram show us?
coy2 <- coyotes$coyH
plot(coy2)
vg.coy2<-variogram(coy2)
par(mfrow = c(1, 2))
plot(vg.coy2)
plot(vg.coy2,fraction=0.005)

Initial Parameter Estimates


We can then use either variogram.fit() or ctmm.guess() to generate our initial parameter guesses/estimates. These variables will later be passed to ctmm.select or ctmm.fit to better assess the model.

If run in your console, variogram.fit() allows you to visually assess and adjust parameters via the manipulate gear in top left corner of the plot pane.

NOTE: Though you can save these parameters for later use in model fitting by selecting ‘save to GUESS’ from manipulate, the code below already includes the creation of object GUESS without needing to save the variogram.fit() parameters from the plot pane.

variogram.fit(vg.coy1)
GUESS<-variogram.fit(vg.coy1)


We can also use ctmm.guess() and the variogram for coy1, vg.coy1, to estimate initial model parameters.


coy1_GUESS <- ctmm.guess(coy1,variogram = vg.coy1,interactive=FALSE)


These parameters, GUESS and coy1_GUESS, are critical in our ability to assess the most suitable model type for the data at hand. We will next explore how ctmm.select and ctmm.fit allow us to test and visualize potential model types.




Model Selection


In order to use autocorrelated kernel density estimation (AKDE) to perform accurate home range analyses, we have to ensure that our data indicates range residency and then choose the best model to represent the data. The first step as discussed above is to use a variogram to visualize whether movement is confined to a given space or home range. Then, a movement model is chosen.

The independent identically distributed (IID) process assumes each data point is independent and random. However, animal movement is automatically correlated since one’s location in space is dependent on continuous movement (i.e. coyotes also can’t teleport). This means the IID model is not a good model for our purposes despite it’s use in older kernel density estimation metrics. The brownian motion (BM) process assumes random movement and diffusion in unlimited space and is good for data not comprehensive enough to demonstrate home range or velocity autocorrelation. Typically, BM processes are not suitable for our purposes. However, the Ohrnstein-Uhlenbeck (OU) process uses BM but assumes fidelity to a general location and is suitable for data which shows signs of residency. What this model lacks is the ability to incorporate autocorrelated velocities. The integrated OU (IOU) process solves this problem for data with high resolution but is unable to demonstrate range residency. All of these shortcomings are solved in the Ohrnstein-Uhlenbeck Foraging (OUF) process which combines OU and IOU processes to enable analysis of data demonstrating both velocity autocorrelation and range residency.

This is especially relevant for improved data sampling methods which are able to combine long sampling periods and high resolution of sampling points. Given the extensive dataset used here, we can expect that our best fit movement model might be the OUF model. We also have to differentiate between the accuracy of isotropic versus anisotropic models for each movement model. By running ctmm.fit and ctmm.select, we can produce data to determine the best fit model. Once we have decided between isotropic and anisotropic, we can proceed to visualize those model types to confirm the results of the initial model guess.


Comparing variogram.fit & ctmm.guess


Let’s input the initial parameter estimates calculated from variogram.fit (GUESS) and ctmm.guess (coy1_GUESS) to compare the two methods.

vg.fitted.mods <- ctmm.select(coy1,CTMM = GUESS,verbose = TRUE) #can take 5+ mins to run 
guess.fitted.mods <- ctmm.select(coy1,CTMM = coy1_GUESS,verbose = TRUE) #can take 5+ mins to run
summary(vg.fitted.mods)
summary(guess.fitted.mods)

Both methods yield the same model fittings. Thus - either are acceptable for initial parameter guessing.

We can see from the summaries above that the anisotropic version of OUF is the ideal model for our data. We can also see that the anisotropic versions of each model were favored over their isotropic counterpart (we are looking for lowest AIC values). We can now visually examine the fits of the anisotropic versions of OU and OUF models.


Assessing Model Types


To do this, we must first extract out of the fitted anisotropic versions of the OU and OUF models

NOTE: as shown above, vg.fitted.mods could also be used to pull out the model types

OU<-guess.fitted.mods[[4]]
OUF<-guess.fitted.mods[[1]]


ctmm.fit accepts a ctmm object as well as the guesses generated by variogram.fit/ctmm.guess (in our case, OU and OUF) and generates values for comparing the fit of each model. These values include: point estimates, confidence intervals, and AICc. Based on this output, the user can select the most appropriate model for their data. We explain some of these parameters as we go along.

M.OU <- ctmm.fit(coy1,OU) #these fits may take ~2 mins
M.OUF <- ctmm.fit(coy1,OUF)
FITS <- list(OU=M.OU,OUF=M.OUF)
summary(FITS)

AICc is the (linearly) corrected Akaike information criteria. AIC balances likelihood against model complexity in a way that is good if we want to make optimal predictions. A lower AIC is better, thus the OUF model seems better here.

The fit parameter DOF[mean] is the number of degrees of freedom worth of data we have to estimate the stationary mean parameter, assuming that the model is correct.

par(mfrow = c(2, 2))
plot(vg.coy1, CTMM=OU, col.CTMM = '#1b9e77')
title("OU")
plot(vg.coy1, CTMM=OU, col.CTMM = '#1b9e77', fraction = 0.005)
title("OU")
plot(vg.coy1, CTMM=OUF, col.CTMM = '#1b1e77')
title("OUF")
plot(vg.coy1, CTMM=OUF, col.CTMM = '#1b1e77', fraction = 0.005)
title("OUF")

We can see from the figures above that OUF is a better fitted model (though not perfect!). The data we are using here are a subset of a much larger body of data. We reduced the number of observations so that the ctmm models would run in a reasonable amount of time, but it is likely that the OUF model would look even better fitted if more observation points were included. Additionally, even though the OUF model is not perfect in the shorter time lag plot, the overall trend over multiple days (longer time lag) is adequate.

par(mfrow = c(1, 2))
plot(vg.coy1,CTMM=FITS,col.CTMM=c("red","purple"),fraction=0.65,level=0.5)
title("zoomed out")
plot(vg.coy1,CTMM=FITS,col.CTMM=c("red","purple"),fraction=0.05,level=0.5)
title("zoomed in")

We can see the OU (pink) and OUF (purple) models plotted here against the variogram we created earlier. The zoomed in figure shows the purple line, the OUF model, follows the variogram more closely compared to the OU model.


Model Fitting


The previous steps revealed the OUF model is best suited for our data. We can now use ctmm.fit to fit our model via maximum likelihood. This function processes a telemetry object and model specification (i.e. OUF, OU, IID, etc.), returning the maximum likelihood parameter and interval estimates. The objects returned by ctmm.fit will allow us to compare the fitted model to the data.

this can take ~2 mins to run

coy1_FIT <- ctmm.fit(coy1, CTMM = OUF, method = "pHREML", control = list(method="pNewton",cores=2), trace = TRUE)
summary(coy1_FIT)

NOTE: CTMM = OUF based on our previous figures/calculations which show that is the best fit model compared to the OU model. Additionally, the argument trace = TRUE provides a progress report in the console window as ctmm.fit runs.

CHALLENGE 2?

variogram.fit(vg.coy2)
coy2_GUESS <- ctmm.guess(coy2,variogram = vg.coy2,interactive=FALSE)
coy2_SELECT <- ctmm.select(coy2,CTMM = coy2_GUESS,verbose = TRUE)
summary(coy2_SELECT)
OU2<-coy2_SELECT[[4]]
OUF2<-coy2_SELECT[[1]]
M.OU2 <- ctmm.fit(coy2,OU2)
M.OUF2 <- ctmm.fit(coy2,OUF2)


FITS2 <- list(OU=M.OU2,OUF=M.OUF2)
summary(FITS2)
par(mfrow = c(2, 2))
plot(vg.coy2, CTMM=M.OU2, col.CTMM = '#1b9e77')
title("OU")
plot(vg.coy2, CTMM=M.OU2, col.CTMM = '#1b9e77', fraction = 0.005)
title("OU")
plot(vg.coy2, CTMM=M.OUF2, col.CTMM = '#1b1e77')
title("OUF")
plot(vg.coy2, CTMM=M.OUF2, col.CTMM = '#1b1e77', fraction = 0.005)
title("OUF")

Which method fits the model best as suggested by the AICc? What do our plots show us about whether OU or OUF best fits the movement data?

CHALLENGE 2


coy2_FIT <- ctmm.fit(coy2, CTMM = M.OUF2, method = "pHREML", control = list(method="pNewton",cores=2), trace = TRUE)
summary(coy2_FIT)

Autocorrelated Kernel Density


After confirming our fitted continuous-time movement model, we use this as the input in calculating Autocorrelated Kernel Density Estimates AKDE. We use the AKDE estimator to produce ecologically relevant metrics, such as home range size, range overlap, and occurrence distributions. It is essential to select the best fitting movement model to produce accurate AKDE metric estimates. Plotting the AKDE estimate defaults to show the 95% home range contour and corresponding 95% confidence intervals. The summary produces point estimates and confidence intervals. Other confidence intervals can be specified if desired.

Recall the methods we went over in Module 19 - Introduction to GIS and Spatial Analysis in R. Kernel density estimates (KDE) utilize the frequency of points to calculate the density of space usage areas utilized by an individual. However, kernel density estimation (KDE) utilizes a model that assumes independent and identically distributed data, or IID. As we have discussed, GPS locational data inherently is impacted by autocorrelation, which leads KDE methods to underestimate home range area. Fleming et al. (2017) lay out a new method of kernel density estimation, AKDE, as one that removes these biases and also can work with less than ideal sample sizes. Before calculating the autocorrelated density estimates (AKDE) for each of these individuals it is worthwhile to understand exactly what is happening “under the hood” statistically.

Conventional Kernel Density

More specifically, kernel density estimation works by using the movement data to create small kernels, which are a weighted function in non-parametric statistics that define the distribution of similar points around a given point x, with k(x,y) representing the similarity of point x given another point y. This bandwidth, or covariance, is denoted as σB. The average of these kernels are then used to estimate of the probability density function (PDF) p. KDE are best calculated with the optimal bandwidth, as this reduces the mean integrated square error (MISE) between p and . Fleming et al. (2015) estimate MISE in order to acheive optimal bandwidth as follows:

Here, σ0 represents the covariance of locations and σB, as discussed, represents the bandwidth.

Autocorrelated Kernel Density

For uncorrelated data, this method works best at estimating the MISE, but further changes must be made to better estimate error with autocorrelated data. To do this, Fleming et al.’s (2015) new estimator uses the same steps but relaxes the assumption of independently and identically distributed data (IID). Here, the minimum occurs at the optimal bandwidth. This new estimator for MISE is defined as follows:

Here γ(τ) is the semi-variance function (SVF), and n(τ) is the number of location pairs with time lag t between them. This function achieves optimal performance as autocorrelation increases, but operates efficiently for uncorrelated data as well by converging with the previously discussed function. Now we can see why we put so much work into analyzing autocorrelation via variograms! All of our previous model fitting is used to calculate the AKDE here, which incorporates autocorrelation to calculate a much more accurate kernel density estimation that reduces much of the problematic biases otherwise present.


AKDE, Overlap, & CDE for Coyotes


both.coyotes <- coyotes[c("coyG","coyH")]

#variograms

variograms <- list(vg.coy1, vg.coy2)
plot.vg <- lapply(1:2, function(i) plot(variograms[[i]],fraction = 0.5))
GUESS_2 <- list(guess.fitted.mods, coy2_GUESS)
FITS_2 <- list(coy1_FIT, coy2_FIT)

par(mfrow = c(2, 2))
plot(variograms[[1]],CTMM=FITS_2,col.CTMM=c("purple"),fraction=0.65,level=0.5)
plot(variograms[[1]],CTMM=FITS_2,col.CTMM=c("purple"),fraction=0.05,level=0.5)
plot(variograms[[2]],CTMM=FITS_2,col.CTMM=c("blue"),fraction=0.65,level=0.5)
plot(variograms[[2]],CTMM=FITS_2,col.CTMM=c("blue"),fraction=0.05,level=0.5)
names(FITS_2) <- names(both.coyotes[1:2])

AKDE corrects for biases due to autocorrelation, small effective sample size, and irregular sampling in time

UDS_coyotes <- akde(both.coyotes[1:2],FITS_2)
plot(UDS_coyotes[[1]])

summary(UDS_coyotes[[1]])

plot(UDS_coyotes[[2]])

summary(UDS_coyotes[[2]])

This function calculates a useful measure of similarity between distributions evaluate overlap of 95% Home Range.The choice method=“BA” computes the Bhattacharyya’s affinity (BA). Values range from zero (no overlap) to 1 (identical Utilization Distributions).

95% level of confidence

overlap(UDS_coyotes)

plotting overlap

#use the plot function to look at the AKDE UD
#95% Home Range (95% is the default)
#The middle contour represent the maximum likelihood area where the animal spends 95% of its time.
plot(both.coyotes, UD = UDS_coyotes, col=rainbow(length(both.coyotes))) 

Coyote P4_2 (in red) and coyote P5_2 (in blue) are tracked coyotes in different packs (4 and 5). We can see how these two packs separate themselves by the little overlap and clear separation of their space usage.

We can also estimate the CDE (conditional distribution of encounters) (Noonan et al., 2020). CDE is a concept describing the long-term encounter location probabilities for movement within home ranges.

CDE_coyotes <- encounter(UDS_coyotes)

plot(both.coyotes,
     col=c("#FF0000", "#F2AD00"),
     UD=CDE_coyotes,
     col.DF="#046C9A", 
     col.grid = NA)

Literature Cited

C, J. M., Fleming, C. H., & Gurarie, E. (2016). ctmm: an r package for analyzing animal relocation data as a continuous-time stochastic process. Methods in Ecology and Evolution, 7(9), 1124–1132.

Codling, E. A., Plank, M. J., & Benhamou, S. (2008). Random walk models in biology. Journal of The Royal Society Interface, 5(25), 813–834. https://doi.org/10.1098/rsif.2008.0014

Fleming, C. H., Fagan, W. F., Mueller, T., Olson, K. A., Leimgruber, P., & Calabrese, J. M. (2015). Rigorous home range estimation with movement data: a new autocorrelated kernel density estimator. Ecology, 96(5), 1182-1188.

Fleming, C. H., & Calabrese, J. M. (2017). A new kernel density estimator for accurate home‐range and species‐range area estimation. Methods in Ecology and Evolution, 8(5), 571-579.

Kranstauber, B., Cameron, A., Weinzerl, R., Fountain, T., Tilak, S., Wikelski, M., & Kays, R. (2011). The Movebank data model for animal tracking. Environmental Modelling & Software, 26(6), 834-835.

Mrozewski, T. (2018). Movebank. Bulletin-Association of Canadian Map Libraries and Archives (ACMLA), (158), 24-27.

Nau, R. (2014, November 4). Notes on the random walk model - people.duke.edu. Notes on the random walk model. Retrieved 2021, from https://people.duke.edu/~rnau/Notes_on_the_random_walk_model--Robert_Nau.pdf